Impact of community structure on information transfer 
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The observation that real complex networks have internal structure has important implication 
for dynamic processes occurring on such topologies. Here we investigate the impact of community 
structure on a model of information transfer able to deal with both search and congestion simul- 
taneously. We show that networks with fuzzy community structure are more efficient in terms of 
packet delivery that those with pronounced community structure. We also propose an alternative 
packet routing algorithm which takes advantage of the knowledge of communities to improve in- 
formation transfer and show that in the context of the model an intermediate level of community 
structure is optimal. Finally, we show that in a hierarchical network setting, providing knowledge of 
communities at the level of highest modularity will improve network capacity by the largest amount. 

PACS numbers: 89.75.Hc, 87.23.Ge 
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I. INTRODUCTION 

The continuing intensity that accompanies tlic study 
of complex networks has led to many important contri- 
butions in a variety of scientific disciplines (for a recent 
review see Specifically, the study of transport prop- 
erties of networks is becoming increasingly important due 
to the constantly growing amount of information and 
commodities being transferred through them. A partic- 
ular focus of these studies is how to make the capacity 
of the network maximal while minimising the delivery 
time. Both network packet routing strategies and net- 
work topology play essential parts in traffic flow in net- 
works. 

Traditionally routing strategies have been based on the 
idea of maintaining routing tables of the best approxima- 
tion of the shortest paths between nodes. In realistic set- 
tings, however, the knowledge that any one of the nodes 
has about the topology of the network will be incom- 
plete. So, much of the focus in recent studies has been 
on searchability. In particular, distributed search using 
only local information has been shown to be efficient in 
spatially embedded networks 0, Q . Networks with scale- 
free degree distributions are particularly navigable using 
local search strategies due to the presence of highly con- 
nected hubs [1]. 

However, when the number of search problems the net- 
work is trying to solve increases, it raises the problem of 
congestion at central nodes. It has been observed, both 
in real world networks lal and in model communication 
networks @, 0, H, i, [3, that the networks col- 

lapse when the load is above a certain threshold and the 
observed transition can be related to the appearance of 
the 1// spectrum of the fluctuations in Internet flow data 

Mm 

These two problems, search and congestion, that have 
so far been analysed separately in the literature can be in- 
corporated in the same communication model. Previous 
work has contributed a collection of models that capture 
the essential features of communication processes and are 



able to handle these two important issues simultaneously 
d, [l^, [l^ . In these models, agents are nodes of a net- 
work and can interchange information packets along links 
in the network. Each agent has a certain capability that 
decreases as the number of packets to deliver increases. 
The transition from a free phase to a congested phase 
has been studied for different network architectures in 
[1, [lot ' whereas in [ist the cost of maintaining communi- 
cation channels was considered. 

The topology of the network also plays a central part in 
communication processes. In [l^ the problem of finding 
optimal network topologies for both search and conges- 
tion for a fixed number of nodes and links was tackled. 
It was found that in the free regime, highly centralised 
topologies facilitating search are optimal, whereas in 
the congested regime decentralised topologies which dis- 
tribute the packet load between nodes are favoured. It 
has been shown that shortest path routing algorithms 
are not optimal for scale free networks due to the pres- 
ence of communication bottlenecks fv^ and several al- 
ternative routing strategies have been proposed to take 
advantage of the scale-free nature of complex networks 
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On the other hand, many networks found in nature 
have been observed to have a modular or community 
structure. Communities are those subsets of nodes that 
are more densely linked internally than to the rest of 
the network. Identifying communities in networks has 
become a problem which has been tackled by many re- 
searchers in recent years (see for example [2^.[23l[2^. and 
for reviews see (25l. [26j). Furthermore, communities are 
often organised in a hierarchical way [H, [13, [H, [H, [13] ■ 
That is, large communities are often comprised of several 
smaller communities. Despite all these efforts, the impact 
that community structure has on information transfer 
has not been considered. 

The aim of this paper is two-fold: firstly, we will in- 
vestigate the effect that community structure has on the 
model of search and congestion, and secondly we will pro- 
pose an alternative routing strategy and demonstrate its 
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impact in the presence of community structure. In the 
next section we will describe the model and recall the 
most important analytical results. In Section II we will 
consider the effect that a modular structure of varying 
strength has on the behaviour of the model. We will then 
show how knowledge of this community structure can be 
taken advantage of to improve transport processes in net- 
works. And in the final section, we give some concluding 
remarks. 



II. COMMUNICATION MODEL 

The communication model considers that the informa- 
tion flowing through the networks is formed by discrete 
packets sent from an origin node to a destination node. 
Each node is an independent agent that can store as 
many packets as necessary. However, to have a realis- 
tic picture of communication we must assume that the 
nodes have a finite capacity to process and deliver pack- 
ets. That is, a node will take longer to deliver two pack- 
ets than just one. A particularly simple example of this 
would be to assume that nodes are able to deliver one 
(or any constant number) information packet per time 
step independent of their load, as in the model of decen- 
tralised information processing in firms of Radner 1311 
and in simple models of computer queues 0, 0, S Il5j |. 
but note that many alternative situations are possible. 

In the present model, each node has a certain ability 
to deliver packets which is limited. This limitation in the 
ability of agents to deliver information can result in con- 
gestion of the network. When the amount of information 
is too large, agents are not able to handle all the packets 
and some of them remain undelivered for extremely long 
periods of time. The maximum amount of information 
that a network can manage before collapse gives a mea- 
sure of the quality of its organisational structure. In this 
study, the interest is focused on when con gest ion occurs 
depending on the topology of the network [l5|. 

The dynamics of the model is as follows. At each 
time step t, an information packet is created at every 
node with probability p. Therefore p is the control pa- 
rameter: small values of p correspond to low density of 
packets and high values of p correspond to high density 
of packets. When a new packet is created, a destina- 
tion node, different from the origin, is chosen randomly 
in the network. Thus, during the following time steps 
t + 1, t + 2, . . . , t + T, the packet travels toward its des- 
tination. Once the packet reaches the destination node, 
it is delivered and disappears from the network. 

The time that a packet remains in the network is re- 
lated not only to the distance between the source and 
the target nodes, but also to the amount of packets in 
its path. Nodes with high loads — i.e. high quantities of 
accumulated packets — will take longer to deliver pack- 
ets or, in other words, it will take more time steps for 
packets to cross regions of the network that are highly 
congested. In particular, at each time step, all the pack- 



ets move from their current position, i, to the next node 
in their path, j, with a probability pij. This probability 
Pij is called the quality of the channel between i and j. In 
this paper, we take the special case that each node is able 
to send one packet at each time step. It is important to 
note, however, that the model is not deterministic. Here, 
a packet which is waiting at a particular node, will be 
sent with equal probability as any other packet waiting 
at the same node. 

The packets in the present model have a limited radius 
of knowledge, that is, they are able to determine whether 
a node within a certain distance r is the destination node. 
In this case, the packet takes the shortest possible route 
to the destination, otherwise, it travels down a link cho- 
sen at random. In this paper we set r = 1, so that only 
nearest neighrbours are recognised. It has been shown 
in previous work that in the free phase, there is no accu- 
mulation at any node in the network and the number of 
packets that arrive at node j is, on average, pBj/ (S* — 1), 
where Bj is the effective betweenness of node j which 
is defined as the fractional number of paths that pack- 
ets take though node j and S is the number of nodes 
in the network. A particular node will collapse when 
pBj/{S — 1) > 1 and the critical congestion point of the 
network will be 



where B* is the maximum effective betweenness in the 
network, that corresponds to the most central node 

If the routing algorithm is Markovian, which is the 
case here, it is possible to estimate Bj analytically. The 
search and congestion process can be formulated as a 
Markov chain, which is dependant on the packet transi- 
tion probability matrix. This matrix is derived from the 
adjacency matrix of the network, the radius of knowl- 
edge r, and the search algorithm. Using this formulation, 
B^ of each node can be calculated analytically for any r 
[l(j |. In these cases, the paths the packets take will not 
be shortest paths. As the radius of knowledge increases, 
Bj converges to shortest path betweenness and will be 
equal to it when r is greater or equal to the diameter of 
the network. 



III. PACKET DYNAMICS OF 
COMMUNICATION MODEL ON NETWORKS 
WITH COMMUNITY STRUCTURE 

The model from [10] can be further exploited to look 
at the effects that community structure has on dynamics. 
To this end we need to be able to construct networks 
with controllable community structure. We choose to 
use a family of pseudo-random networks since all other 
properties (such as node degree and clustering) will be 
equivalent to fully random networks. The only thing that 
we will vary is the strength of community structure. 
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First we employ the networks proposed in [32| . These 
networks are comprised of 128 nodes which are spht into 
four communities of 32 nodes each. Pairs of nodes belong- 
ing to the same community are linked with probability 
Pirn whereas pairs belonging to different communities are 
joined with probability pout ■ The value of pm is chosen so 
that the average number of links a node has to members 
of any other community, Zin, can be controlled. While 
Pin (and therefore Zin) is varied freely, the value oi Pout 
is chosen to keep the total average node degree, k, con- 
stant, and set to 16. As Zin is increased from zero, the 
communities become better defined and easier to identify. 

To address the question of hierarchical structure we 
use a generalisation of the model of generation of net- 
works with community structure that includes two hier- 
archical levels of communities as introduced in • The 
graphs are generated as follows: in a set of 256 nodes, 
16 compartments are prescribed that will represent our 
first community organisational level. Each of these sub- 
communities contains 16 nodes each. Furthermore, four 
second level communities are prescribed, each containing 
four sub-communities, that is 64 nodes each. The inter- 
nal degree of nodes at the first level Zin-^ and the internal 
degree of nodes at the second level Zi„2 are constrained 



to keep an average degree Zi^ + Zi, 
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now on, networks with two hierarchical levels are indi- 
cated as Zin-^ - Zin2 , e.g. a network with 13-4 means 13 
links with the nodes of its first hierarchical level commu- 
nity (more internal) , 4 links with the rest of communities 
that form the second hierarchical level (more external) 
and 1 link with any community of the rest of the net- 
work. 

As a simple measure of structural efficiency of the net- 
work in terms of packet transport, we can consider the 
number of packets present in the network. We allow the 
dynamics to reach a steady state, which we detect by 
considering the rate at which the number of packets in- 
creases in the system. Once this rate becomes small, 
fluctuating around 0, we have reached the end of the 
transient. It is important to note that when p > pc the 
system never reaches a steady state, the mean number 
of packets keeps growing linearly with time, and the rate 
never becomes very small. We also average over several 
realisations, since the number of packets in the system is 
subject to statistical fluctuations. 



A. Original communication model 

First of all we simulate the dynamics of the model de- 
scribed above, in which the packets have no knowledge 
of the topology of the network at the level of community 
structure. Introducing community structure in the net- 
work topology over which the dynamics occur increases 
the traffic load on the nodes which connect communities. 
This is in agreement with the finding that cutting links 
with the highest betweenness separates communities [2^ . 
It follows that the effective betweenness of the nodes at 



each end of the bridge links will also be increased. As a 
result, the capacity of the network to deliver packets is 
reduced in function of how fuzzy the community struc- 
ture is. 
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FIG. 1: (colour online) Each point represents the number of 
packets averaged over 100 realisations in the steady state of 
the dynamics, (a) Number of floating packets as a function 
of p using the original search algorithm in networks with one 
level of community structure. The different colours denote 
varying levels of community strength as controlled by the pa- 
rameter Zin and the vertical lines correspond to the analytical 
prediction of the onset of congestion (Section [III equation [TJ . 
(b) Onset of congestion pc for varying Zin. 

From Fig. [1] we can see that the analytical calculation 
from Section ini of the onset of congestion pc agrees very 
well with the point at which the number of floating pack- 
ets diverges. As the strength of community structure is 
increased by raising Zin, Pc is reduced. This seems logi- 
cal, since the origin and destination of packets are chosen 
at random. It follows that the probability of creating a 
packet with both origin and destination in one commu- 
nity is 1 /4. All other packets will necessarily have to pass 
through at least one central, " bridge" node that connects 
two communities. This leads to an increase in the number 
of packets that pass through bridge nodes, increasing its 
effective betweenness. As a result of receiving a dispro- 
portionate amount of packets, these nodes will collapse 
at lower values of p, leading to a cascade of collapses 
throughout the network. This effect becomes more and 
more pronounced as Zj„ increases, so, the stronger the 
community structure, the lower p^. 

In the case of hierarchical networks, we concentrate on 
three different network topologies which are particularly 
instructive, 13-4, 14-3 and 15-2. Once again the ana- 
lytical calculation corresponds very well to the point at 
which the number of floating packets diverges, see Fig. [2l 
It is worth noting that these three networks have almost 
the same pc- This is due to the fact that the average 
number of links per node between communities of size 64 
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is constant and set to 1. What is varied is the strength of 
the intermediate and innermost level of community struc- 
ture. In the case of the original communication model, 
this shows little effect. 
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FIG. 2: (colour online) Number of floating packets as a func- 
tion of p using the original search in networks with hierar- 
chical community structure on two levels. The vertical lines 
correspond to analytical prediction of the onset of congestion. 




FIG. 3: (colour online) Comparison of original search al- 
gorithm with the modified for the same networks with com- 
munity structure. The number of floating packets is plotted 
against p. The four panels depict four networks with varying 
community strength controlled by the parameter Zi„, and the 
red points show the original search algorithm and the black 
points denote the modified algorithm incorporating commu- 
nity information. 



B. Modifying the communication model 

Clearly, networks with strong community structure are 
less efficient at delivering packets which are oblivious to 
the underlying topology. But, what happens when we 
give the search process some information about the com- 
munity structure? To address this question we propose 
a simple modification of the way packets are transferred 
between nodes. 

Let us consider a packet generated at node i in com- 
munity Ci with destination node j in community Cj. At 
each step in its path, the packet is given information of 
the community of neighbouring nodes. Should the packet 
destination community be the same as that of any of the 
neighbours of the node that is processing the packet, the 
packet is sent to one of those neighbours, otherwise it is 
sent down a link chosen at random. In this way, packets 
are able to arrive at the destination community without 
necessarily arriving at the destination node. The idea is 
that once within the destination community, finding the 
destination node is easier. 

In Figure [3] we plot the number of floating packets in 
the network at the steady state against the packet injec- 
tion rate p. The dynamics are performed on networks 
with ad-hoc community structure of varying strength, 
controlled by the parameter Zi„, the average number of 
links internal to the community. When Zm = 4 the net- 
work is equivalent to an Erdos-Renyi random graph with 
128 nodes and 16 links per node. In this scenario, the 
original search algorithm performs much better in terms 
of ability to deliver packets. This seems logical: giving 
packets information about communities which are not 
present will not improve the packet's ability to find the 



destination node. Indeed, for lower values of Zin, this 
information is detrimental to efficiency, since the pre- 
defined partitions of the network actually contain fewer 
internal links, compared to external ones. In this sce- 
nario, packets are often sent to regions of the network 
which are less likely to contain the destination node. This 
is highlighted in Figure |4|3. where we see that for very 
low values of Zin the original search algorithm collapses 
the network at much higher values of p than the modified 
algorithm. 

When the strength of the community structure is in- 
creased, the modified search algorithm improves the effi- 
ciency of the network considerably. For Zm > 8 [sJ] , the 
onset of congestion in terms of p is considerably higher 
for the modified search algorithm, and the same network 
is much more efficient at delivering packets for all values 
of Zin > 8. In other words, the modified algorithm is 
able to find more efficient routes to deliver packets and 
the network is able to handle a much higher load. 

In the modified search algorithm, the calculation from 
Sectionim (equationfTj) is still valid, however, the analytic 
calculation of B* is more involved than in [l3|. Never- 
theless we can estimate pc of the network by looking at 
the point where the number of floating packets diverges. 
In Figure [3^, pc is estimated in this fashion. When the 
communities are extremely well defined, say Zin — 15, 
flow through the network is restricted. So even though 
the search method of the packets is greatly improved, 
and they are able to find the correct community in a 
short number of steps, flow is restricted by the forma- 
tion of bottlenecks at the interface between two com- 
munities. It emerges that an intermediate community 
structure strength, Zin = 12 shows optimal efficiency in 
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FIG. 4: (colour online) Onset of collapse for the modi- 
fied search algorithm as applied to networks with community 
structure at a single level, (a) Number of floating packets as 
a function of p for the modified search. Different colours de- 
note varying community strengths, controlled by parameter 
Zin and the vertical lines now denote the estimate of the onset 
of congestion pc- (b) pc as a function of internal connectiv- 
ity Zin for both the original model, r = 1, and the modified 
model, r — 1 com. 



terms of pc- This suggests that for the flow to be opti- 
mal there must be a balance between internal strength 
of communities and connections to other communities. 

For the case of networks with hierarchical community 
structure as described above, community information can 
now be given at two levels. The packets can be given 
information about the community structure on the first 
level, that is, they are given knowledge about which com- 
munity of the four communities of size 64 the destination 
node belongs to. From here on, this is denoted as i = 4. 
Alternatively, we can give nodes information on the sec- 
ond level of community structure, so that packets know 
which one of the 16 communities of size 16 the destina- 
tion node belongs to, which we denote i = 16. 

Once information about community structure is given 
to the packets, the efficiency of the network to deliver 
these is increased considerably as in the case of single 
level community structure. The level of community in- 
formation which increases the efficacy of information flow 
by the largest amount is dependent on the topology of the 
network. Compared with no community information be- 
ing given to the packets, i — 16 increases the values of 
Pc almost fivefold, in all three networks. In the case of 
13-4, Pc is increased from 0.0132 to 0.064. A stronger 
community structure at the second level, 14-3 and 15-2 
does not make much of an impact when i ^ 16, with pc 
being 0.063 for both. 

However, when information is given at the interme- 
diate level of community structure, i = 4, the differ- 
ences become more apparent. For the 13-4 configuration, 
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FIG. 5: (colour online) Number of fioating packets in net- 
works with hierarchical community structure. The number of 
fioating packets is show as a function of p for three slightly 
different hierarchical networks, (a) 13-4 (b) 14-3 (c) 15-2. For 
each, the level of community hierarchy information packets 
are given is varied between no information given, i = 0, in- 
formation given at the first level i = 4 and information given 
at the second level i = 16. The vertical lines represent the 
analytical calculation as in Section |ll] for the original search 
algorithm. 



community information at this level favours information 
diffusion more, with pc being 0.071, higher than in the 
i = 16 case. However in the case of the 14-3 network, 
the opposite is true: giving information at the alterna- 
tive, z = 16 level is (marginally) more beneficial. For the 
15-2 network, giving more precise information causes a 
considerable improvement. 

It is interesting to compare these results with other 
topological characterisations of complex networks. In 
particular, the most common measure related to commu- 
nity structure is the modularity measure, Q, proposed in 
[32] which measures the quality of a particular partition 
of a particular network. It is defined as follows: 



where the element i,j of the matrix e represents the 
fraction of links between communities i and j and — 
Cij. This value can also be measured at two levels. 
One is at the first level of the hierarchy, where nodes are 
grouped in 4 communities of 64 nodes each, which corre- 
sponds to the i = 4 case. The other, corresponding to the 
i — 16 case, is considering that the nodes are grouped in 
16 communities of 16 nodes each. In the three networks 
we are considering, we only vary the strength of the sec- 
ond level of community structure, so for the i = A case, 
the value of modularity remains constant. For the i — 16 
case however, the value of Q varies with the strength of 
the second level. For the 13-4 network, the first level of 
community structure is a better partition in terms of Q, 
whereas for 14-3 and 15-2 the second level is a better 
partition. See Table |T] for values. 

For 13-4, where the best partition is found at the first 
level of community structure j = 4, giving packets in- 
formation about the same level improves the efficiency 
of the network more than giving information at the sec- 
ond level. For 14-3 and 15-2 the opposite is true: in 
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i = 16 


i = 4 


i = 16 


i = 4 


13-4 


0.0132 


0.064 


0.071 


0.660 




0.695 


14-3 


0.0118 


0.063 


0.061 


0.726 




0.695 


15-2 


0.0113 


0.063 


0.053 


0.771 




0.695 



TABLE I: Table of values of the onset of collapse pc in hierar- 
chical networks and the values of modularity of the same for 
two levels of grouping. 

both cases the best partition is found at the second level, 
i = 16, and the best flow in terms of pc is found when 
giving information about the same level. In other words 
the two coincide. This means that if communities are or- 
ganised in a hierarchical fashion, it is always best to give 
information at the level where the maximum modularity 
is found. 



IV. CONCLUSIONS 

In this paper we have taken advantage of a model incor- 
porating search and congestion simultaneously to inves- 
tigate the impact that community structure has on infor- 
mation transport. We have shown that transport is com- 
promised when community structure is introduced in the 
network since community structure implies the presence 



of bottlenecks. In fact, the better defined the commu- 
nities are, the more affected packet transport becomes. 
We have also shown that transport can be dramatically 
improved by providing packets with information about 
the community structure. And finally we have shown 
that the largest improvements are found when the par- 
tition with the largest modularity is used to provide the 
information. 

This suggests that it is possible to infer a priori what 
kind of information should be given to packets to opti- 
mise packet transport, just by identifying the commu- 
nity structure. By finding the communities at the level 
of highest modularity, and providing information at this 
level, packet transport appears to be optimal. The ques- 
tion remains: is this is always the case? It certainly 
seems possible to improve information transfer on an ar- 
bitrary network just by providing the search algorithm 
information about the community structure at the level 
of highest modularity. Since maximising the modular- 
ity measure is NP hard [33| . all community detection 
algorithms that depend on maximising modularity are 
heuristic approximations and as such different identifica- 
tion algorithms find different partitions with varying val- 
ues of optimal modularity for real networks. The results 
here suggest that giving information about the commu- 
nity structure as found by the most accurate algorithms 
would be best. But this remains to be shown in the case 
of real networks. 
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